<!DOCTYPE html>
<html lang="en">
<head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <meta name="description" content="This is an annotated implementation/tutorial of the Feedback Transformer in PyTorch."/>

    <meta name="twitter:card" content="summary"/>
    <meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
    <meta name="twitter:title" content="Feedback Transformer"/>
    <meta name="twitter:description" content="This is an annotated implementation/tutorial of the Feedback Transformer in PyTorch."/>
    <meta name="twitter:site" content="@labmlai"/>
    <meta name="twitter:creator" content="@labmlai"/>

    <meta property="og:url" content="https://nn.labml.ai/transformers/feedback/index.html"/>
    <meta property="og:title" content="Feedback Transformer"/>
    <meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
    <meta property="og:site_name" content="Feedback Transformer"/>
    <meta property="og:type" content="object"/>
    <meta property="og:description" content="This is an annotated implementation/tutorial of the Feedback Transformer in PyTorch."/>

    <title>Feedback Transformer</title>
    <link rel="shortcut icon" href="/icon.png"/>
    <link rel="stylesheet" href="../../pylit.css?v=1">
    <link rel="canonical" href="https://nn.labml.ai/transformers/feedback/index.html"/>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.13.18/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous">

    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
    <script>
        window.dataLayer = window.dataLayer || [];

        function gtag() {
            dataLayer.push(arguments);
        }

        gtag('js', new Date());

        gtag('config', 'G-4V3HC8HBLH');
    </script>
</head>
<body>
<div id='container'>
    <div id="background"></div>
    <div class='section'>
        <div class='docs'>
            <p>
                <a class="parent" href="/">home</a>
                <a class="parent" href="../index.html">transformers</a>
                <a class="parent" href="index.html">feedback</a>
            </p>
            <p>
                <a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations" target="_blank">
                    <img alt="Github"
                         src="https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social"
                         style="max-width:100%;"/></a>
                <a href="https://twitter.com/labmlai" rel="nofollow" target="_blank">
                    <img alt="Twitter"
                         src="https://img.shields.io/twitter/follow/labmlai?style=social"
                         style="max-width:100%;"/></a>
            </p>
            <p>
                <a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/transformers/feedback/__init__.py" target="_blank">
                    View code on Github</a>
            </p>
        </div>
    </div>
    <div class='section' id='section-0'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-0'>#</a>
            </div>
            <h1>Feedback Transformer</h1>
<p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2002.09402">Accessing Higher-level Representations in Sequential Transformers with Feedback Memory</a>. </p>
<p>Normal transformers process tokens in parallel. Each transformer layer pays attention to the outputs of the previous layer. The feedback transformer pays attention to the outputs of all layers in previous steps. This adds recurrence, so we need to process the sequence token by token. This slows down training significantly (about 5X - 10X depending on the sequence length). However, when predicting, the feedback transformer is faster because you can predict the next token if you cache the memory vectors. </p>
<p>To speed up training, the paper discusses starting with a short sequence length and gradually increasing it. It also discusses using a pretrained parallel transformer as the starting point. </p>
<p>The original feedback transformer doesn't keep the outputs of all layers. Instead, it keeps a weighted sum of the outputs of all layers. This reduces the memory used for caching during prediction. The first half of this file implements this. </p>
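<p>The weighted sum described above can be illustrated with a small sketch. It uses plain Python lists rather than tensors so that it runs without dependencies, and it assumes the per-layer weights are a softmax over learnable scalars; the names <code>memory_vector</code> and <code>layer_logits</code> are hypothetical and are not taken from this file. </p>
<pre>
```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def memory_vector(layer_outputs, layer_logits):
    """Collapse the per-layer outputs of one step into a single memory vector.

    layer_outputs: list of L vectors, each a list of d floats
    layer_logits:  list of L scalars (assumed learnable; hypothetical name)
    """
    weights = softmax(layer_logits)
    d = len(layer_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, layer_outputs)) for i in range(d)]


# Three "layers" with two features each; equal logits give a plain average.
outputs = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
mem = memory_vector(outputs, [0.0, 0.0, 0.0])
print(mem)
```
</pre>
<p>Only one vector per step is cached this way, instead of one vector per layer per step. </p>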
<p>The updated feedback transformer shares the weights <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.132216em;vertical-align:-0.2831079999999999em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-2.4168920000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.01968em;">l</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2831079999999999em;"><span></span></span></span></span></span></span></span></span></span></span> and <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.096108em;vertical-align:-0.247em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-2.4530000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">v</span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.01968em;">l</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span></span></span></span>, which are used to calculate keys and values, among the layers. We then calculate the keys and values for each step only once and keep them cached. The <a href="#shared_kv">second half</a> of this file implements this. We implemented a custom PyTorch function to improve performance. </p>
<p>Here are the training <a href="experiment.html">code</a> and a notebook for training a feedback transformer on the Tiny Shakespeare dataset. </p>
<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/feedback/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://app.labml.ai/run/d8eb9416530a11eb8fb50242ac1c0002"> <img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a></p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">43</span><span></span><span class="kn">import</span> <span class="nn">math</span>
<span class="lineno">44</span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>
<span class="lineno">45</span>
<span class="lineno">46</span><span class="kn">import</span> <span class="nn">torch</span>
<span class="lineno">47</span><span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span>
<span class="lineno">48</span>
<span class="lineno">49</span><span class="kn">from</span> <span class="nn">labml_helpers.module</span> <span class="kn">import</span> <span class="n">Module</span>
<span class="lineno">50</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.feed_forward</span> <span class="kn">import</span> <span class="n">FeedForward</span>
<span class="lineno">51</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.mha</span> <span class="kn">import</span> <span class="n">PrepareForMultiHeadAttention</span>
<span class="lineno">52</span><span class="kn">from</span> <span class="nn">labml_nn.utils</span> <span class="kn">import</span> <span class="n">clone_module_list</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-1'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-1'>#</a>
            </div>
            <h2>Feedback Attention</h2>
<p>This module computes recurrent attention, similar to the attention from the original transformers paper. </p>
<p><span ><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mop"><span class="mord mathnormal">A</span><span class="mord mathnormal">tt</span><span class="mord mathnormal">e</span><span class="mord mathnormal">n</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span></span><span class="mopen">(</span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:3.0000299999999998em;vertical-align:-1.25003em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6944399999999998em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">se</span><span class="mord mathnormal mtight" style="margin-right:0.03588em;">q</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mop"><span class="mord mathnormal">so</span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mord mathnormal">t</span><span class="mord 
mathnormal">ma</span><span class="mord mathnormal">x</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span></span><span class="mord"><span class="delimsizing size4">(</span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.5261079999999998em;"><span style="top:-2.25278em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.85722em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord" style="padding-left:0.833em;"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-2.81722em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em;"><svg height="1.08em" preserveaspectratio="xMinYMin slice" viewbox="0 0 400000 1080" width="400em" xmlns="http://www.w3.org/2000/svg"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.18278000000000005em;"><span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">Q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.93em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="delimsizing size4">)</span></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span></span></span></span></span></span></p>
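<p>As a rough illustration of the formula above, the following sketch computes attention for a single query over a sequence of keys and values, with the softmax taken over the sequence dimension. It uses plain Python lists instead of tensors and ignores heads, batching and positional terms; the function name <code>attend</code> is hypothetical. </p>
<pre>
```python
import math


def attend(query, keys, values):
    """Attention of one query over a sequence of keys and values.

    query:  list of d_k floats
    keys:   list of seq vectors, each with d_k floats
    values: list of seq vectors, each with d_v floats
    """
    d_k = len(query)
    # Scaled dot-product score for every position in the sequence
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    # Softmax over the sequence dimension
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Weighted sum of the values
    d_v = len(values[0])
    out = [sum(a * v[i] for a, v in zip(attn, values)) for i in range(d_v)]
    return out, attn


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
out, attn = attend([2.0, 0.0], keys, values)
print(attn, out)
```
</pre>
<p>The first and third keys have the same dot product with the query, so they receive equal attention weights, and the weights sum to one. </p>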

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">55</span><span class="k">class</span> <span class="nc">FeedbackAttention</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-2'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-2'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">heads</span></code>
 is the number of attention heads </li>
<li><code  class="highlight"><span></span><span class="n">d_model</span></code>
 is the number of features in the transformer </li>
<li><code  class="highlight"><span></span><span class="n">dropout_prob</span></code>
 is the attention dropout probability </li>
<li><code  class="highlight"><span></span><span class="n">is_kv_precomputed</span></code>
 is whether the key and value tensors are already calculated</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">66</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">heads</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">d_model</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">dropout_prob</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.1</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
<span class="lineno">67</span>                 <span class="n">is_kv_precomputed</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-3'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-3'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">75</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-4'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-4'>#</a>
            </div>
            <p>Number of features per head </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">78</span>        <span class="bp">self</span><span class="o">.</span><span class="n">d_k</span> <span class="o">=</span> <span class="n">d_model</span> <span class="o">//</span> <span class="n">heads</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-5'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-5'>#</a>
            </div>
            <p> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">80</span>        <span class="bp">self</span><span class="o">.</span><span class="n">heads</span> <span class="o">=</span> <span class="n">heads</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-6'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-6'>#</a>
            </div>
            <p>This transforms the <code  class="highlight"><span></span><span class="n">query</span></code>
 for multi-headed attention. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">83</span>        <span class="bp">self</span><span class="o">.</span><span class="n">query</span> <span class="o">=</span> <span class="n">PrepareForMultiHeadAttention</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_k</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-7'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-7'>#</a>
            </div>
            <p>These transform the <code  class="highlight"><span></span><span class="n">key</span></code>
 and <code  class="highlight"><span></span><span class="n">value</span></code>
 for multi-headed attention. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">85</span>        <span class="k">if</span> <span class="ow">not</span> <span class="n">is_kv_precomputed</span><span class="p">:</span>
<span class="lineno">86</span>            <span class="bp">self</span><span class="o">.</span><span class="n">key</span> <span class="o">=</span> <span class="n">PrepareForMultiHeadAttention</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_k</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="lineno">87</span>            <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">PrepareForMultiHeadAttention</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_k</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-8'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-8'>#</a>
            </div>
            <p>Keys and values are already calculated </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">89</span>        <span class="k">else</span><span class="p">:</span>
<span class="lineno">90</span>            <span class="bp">self</span><span class="o">.</span><span class="n">key</span> <span class="o">=</span> <span class="kc">None</span>
<span class="lineno">91</span>            <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="kc">None</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-9'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-9'>#</a>
            </div>
            <p>Output layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">94</span>        <span class="bp">self</span><span class="o">.</span><span class="n">output</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">d_model</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-10'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-10'>#</a>
            </div>
            <p>Dropout </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">96</span>        <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout_prob</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-11'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-11'>#</a>
            </div>
            <p>Scaling factor before the softmax </p>
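<p>A small worked example of the scaling factor, with illustrative sizes that are assumptions and not taken from this file. A dot product of <code>d_k</code> roughly unit-variance terms has a variance of about <code>d_k</code>, so multiplying by this factor keeps the softmax inputs in a moderate range. </p>
<pre>
```python
import math

d_model, heads = 512, 8      # illustrative sizes, not taken from the source
d_k = d_model // heads       # features per head
scale = 1 / math.sqrt(d_k)   # raw dot products are multiplied by this before the softmax

print(d_k, scale)  # prints: 64 0.125
```
</pre>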

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">98</span>        <span class="bp">self</span><span class="o">.</span><span class="n">scale</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">d_k</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-12'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-12'>#</a>
            </div>
            <p>Softmax for attention along the time dimension of <code  class="highlight"><span></span><span class="n">key</span></code>
 </p>
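<p>To make the choice of dimension concrete, here is a sketch of a softmax taken over the first axis of a nested list. It assumes the scores are laid out as <code>[seq_len][batch]</code>, which is an assumption made for illustration; the helper name <code>softmax_dim0</code> is hypothetical. </p>
<pre>
```python
import math


def softmax_dim0(scores):
    """Softmax over the first (time) axis of a [seq_len][batch] nested list."""
    seq_len, batch = len(scores), len(scores[0])
    out = [[0.0] * batch for _ in range(seq_len)]
    for b in range(batch):
        # Normalize each batch column across all time steps
        column = [scores[t][b] for t in range(seq_len)]
        m = max(column)
        exps = [math.exp(s - m) for s in column]
        total = sum(exps)
        for t in range(seq_len):
            out[t][b] = exps[t] / total
    return out


scores = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
probs = softmax_dim0(scores)
print(probs)
```
</pre>
<p>Each column, that is each batch element, sums to one across the time steps, so every query distributes its attention over the keys in the sequence. </p>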

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">101</span>        <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Softmax</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-13'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-13'>#</a>
            </div>
            <p>Number of relative positions </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">104</span>        <span class="bp">self</span><span class="o">.</span><span class="n">P</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">**</span> <span class="mi">12</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-14'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-14'>#</a>
            </div>
            <p>Relative positional embeddings for the key, relative to the query. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">107</span>        <span class="bp">self</span><span class="o">.</span><span class="n">key_pos_embeddings</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">P</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_k</span><span class="p">)),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-15'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-15'>#</a>
            </div>
            <p>Relative positional embedding bias for the key, relative to the query. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">109</span>        <span class="bp">self</span><span class="o">.</span><span class="n">key_pos_bias</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">P</span><span class="p">,</span> <span class="n">heads</span><span class="p">)),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-16'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-16'>#</a>
            </div>
            <p>Positional embeddings for the query, which are independent of the position of the query </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">111</span>        <span class="bp">self</span><span class="o">.</span><span class="n">query_pos_bias</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">heads</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_k</span><span class="p">)),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-17'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-17'>#</a>
            </div>
            <p>We store the attention so that it can be used for logging or other computations if needed </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">114</span>        <span class="bp">self</span><span class="o">.</span><span class="n">attn</span> <span class="o">=</span> <span class="kc">None</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-18'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-18'>#</a>
            </div>
            <h3>Get attention scores</h3>
<p>We use relative positional encodings for attention, similar to the <a href="../relative_mha.html">relative multi-head attention from the Transformer-XL paper</a>. </p>
<p>The attention from the current step's query to the key at step <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.85396em;vertical-align:-0.19444em;"></span><span class="mord coloredeq eqr" style=""><span class="mord mathnormal" style="margin-right:0.05724em">j</span></span></span></span></span></span> (relative to the current step) is,</p>
<span ><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:7.281318em;vertical-align:-3.3906589999999994em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.8906590000000003em;"><span style="top:-6.1218900000000005em;"><span class="pstrut" style="height:3.130339em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">A</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span><span style="top:-4.562782em;"><span class="pstrut" style="height:3.130339em;"></span><span class="mord"></span></span><span style="top:-2.9805660000000005em;"><span class="pstrut" style="height:3.130339em;"></span><span class="mord"></span></span><span style="top:-1.167119000000001em;"><span class="pstrut" style="height:3.130339em;"></span><span class="mord"></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:3.3906589999999994em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.8906590000000003em;"><span style="top:-6.1218900000000005em;"><span class="pstrut" style="height:3.130339em;"></span><span 
class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mord mathnormal">Q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span><span style="top:-4.562782em;"><span class="pstrut" style="height:3.130339em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">i</span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span 
style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.714392em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">q</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">i</span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.899108em;"><span style="top:-2.4530000000000003em;margin-left:-0.07847em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.383108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" 
style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span><span style="top:-2.9805660000000005em;"><span class="pstrut" style="height:3.130339em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mopen">(</span><span class="mord mathnormal">Q</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.891331em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">Q</span></span></span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" 
style="height:0.8991079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord coloredeq eql" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.891331em;"><span style="top:-2.4530000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" 
style="margin-right:0.07153em">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.383108em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span><span style="top:-1.167119000000001em;"><span class="pstrut" style="height:3.130339em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-2.069561em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="">A</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord"><span class="mord mathnormal">Q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" 
style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.0304389999999999em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.899108em;"><span style="top:-1.972561em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="margin-right:0.05017em">B</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord"><span class="mord mathnormal">Q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord coloredeq eql" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.891331em;"><span style="top:-2.4530000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" 
style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="margin-right:0.07153em">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.383108em;"><span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.127439em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.1303389999999998em;"><span style="top:-2.1999em;margin-left:0em;"><span class="pstrut" style="height:3.130339em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="margin-right:0.07153em">C</span></span></span></span><span style="top:-3.130339em;"><span class="pstrut" style="height:3.130339em;"></span><span><span class="mop"><span class="mord"><span class="mord"><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.891331em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" 
style="">Q</span></span></span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:1.130339em;"><span style="top:-3.344231em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.030439em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.130339em;"><span style="top:-2.1029em;margin-left:0em;"><span class="pstrut" style="height:3.130339em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="margin-right:0.02778em">D</span></span></span></span></span><span style="top:-3.130339em;"><span class="pstrut" 
style="height:3.130339em;"></span><span><span class="mop"><span class="mord"><span class="mord"><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.891331em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">Q</span></span></span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:1.130339em;"><span style="top:-3.344231em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord coloredeq eql" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.891331em;"><span style="top:-2.4530000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="margin-right:0.07153em">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.383108em;"><span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.127439em;"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:3.3906589999999994em;"><span></span></span></span></span></span></span></span></span></span></span></span></span>
<p>where <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.969438em;vertical-align:-0.286108em;"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span> are linear transformations of the original embeddings <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2438799999999999em;vertical-align:-0.394772em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">q</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-2.441336em;margin-left:-0.07847em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.394772em;"><span></span></span></span></span></span></span></span></span></span></span>, and <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.236103em;vertical-align:-0.394772em;"></span><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">Q</span></span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eql" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-2.441336em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="margin-right:0.07153em">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.394772em;"><span></span></span></span></span></span></span></span></span></span></span></span> are linear transformations of the positional encodings <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.969438em;vertical-align:-0.286108em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span>. </p>
<p>We replace the term <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord" style="color:lightgreen"><span class="mord mathnormal" style="margin-right:0.02778em">D</span></span></span></span></span></span> with <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.969438em;vertical-align:-0.286108em;"></span><span class="mord coloredeq eqp" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.05764em">S</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.05764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span></span>. </p>
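<p>The expansion into the four terms A, B, C and D can be checked numerically. The sketch below is only an illustration in plain Python: the vectors and the names <code>q</code>, <code>k_j</code>, <code>u_q</code>, <code>u_k_j</code> and <code>s_j</code> are made up for this example and are not taken from the implementation. It confirms that the four terms sum to the full product, and shows the score obtained when D is swapped for a scalar bias.</p>

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical values, chosen only for illustration
q = [0.5, -1.0, 2.0]      # Q: transformed query embedding
k_j = [1.5, 0.25, -0.5]   # K_j: transformed key embedding at position j
u_q = [0.1, 0.2, -0.3]    # U^Q: positional term on the query side
u_k_j = [-0.4, 0.6, 0.8]  # U^K_j: positional term on the key side
s_j = 0.05                # S_j: scalar bias that stands in for term D

# The four terms of the expansion
term_a = dot(q, k_j)      # A = Q^T K_j
term_b = dot(q, u_k_j)    # B = Q^T U^K_j
term_c = dot(u_q, k_j)    # C = (U^Q)^T K_j
term_d = dot(u_q, u_k_j)  # D = (U^Q)^T U^K_j

# Full product (Q + U^Q)^T (K_j + U^K_j) and the sum of its four terms
full = dot(add(q, u_q), add(k_j, u_k_j))
expanded = term_a + term_b + term_c + term_d

# Score after replacing D with S_j
score = term_a + term_b + term_c + s_j
```

<p>Here <code>full</code> and <code>expanded</code> agree up to floating-point error, and <code>score</code> differs from them only by <code>s_j - term_d</code>.</p>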

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">116</span>    <span class="k">def</span> <span class="nf">get_scores</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">query</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-19'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-19'>#</a>
            </div>
            <p><span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.236103em;vertical-align:-0.394772em;"></span><span class="mord coloredeq eql" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-2.441336em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="margin-right:0.07153em">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.394772em;"><span></span></span></span></span></span></span></span></span></span></span></span> </p>
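<p>The slice <code>[-key.shape[0]:]</code> keeps only the last <code>key.shape[0]</code> rows of the positional embedding table. Assuming the first dimension of <code>key</code> is the number of keys, this gives one positional row per key. A minimal sketch with a plain Python list standing in for the tensor (the table contents, its length and the helper <code>last_n</code> are invented for illustration):</p>

```python
# Hypothetical table with one entry per position, for a maximum length of 6
key_pos_embeddings = ["p0", "p1", "p2", "p3", "p4", "p5"]


def last_n(table, n):
    """Return the last n rows, mirroring table[-n:] on a tensor."""
    return table[-n:]


num_keys = 4  # stands in for key.shape[0]
selected = last_n(key_pos_embeddings, num_keys)
```

<p>With four keys, <code>selected</code> holds the final four entries of the table, so the most recent key always lines up with the last row.</p>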

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">144</span>        <span class="n">key_pos_emb</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">key_pos_embeddings</span><span class="p">[</span><span class="o">-</span><span class="n">key</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]:]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-20'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-20'>#</a>
            </div>
            <p><span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">Q</span></span></span></span></span></span></span></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">146</span>        <span class="n">query_pos_bias</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">query_pos_bias</span><span class="p">[</span><span class="kc">None</span><span class="p">,</span> <span class="p">:,</span> <span class="p">:]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-21'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-21'>#</a>
            </div>
            <p><span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.969438em;vertical-align:-0.286108em;"></span><span class="mord coloredeq eqp" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.05764em">S</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.05764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">148</span>        <span class="n">key_pos_bias</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">key_pos_bias</span><span class="p">[</span><span class="o">-</span><span class="n">key</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]:]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-22'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-22'>#</a>
            </div>
            <p><span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.879547em;vertical-align:-1.0304389999999999em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8491080000000001em;"><span style="top:-2.069561em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="">A</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord"><span class="mord mathnormal">Q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:1.0304389999999999em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:2.110778em;vertical-align:-1.0304389999999999em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.080339em;"><span style="top:-2.1499em;margin-left:0em;"><span class="pstrut" style="height:3.080339em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="margin-right:0.07153em">C</span></span></span></span><span style="top:-3.080339em;"><span class="pstrut" style="height:3.080339em;"></span><span><span class="mop"><span class="mord"><span class="mord"><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">Q</span></span></span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:1.080339em;"><span style="top:-3.294231em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.0304389999999999em;"><span></span></span></span></span></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">151</span>        <span class="n">ac</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">einsum</span><span class="p">(</span><span class="s1">&#39;bhd,jbhd-&gt;jbh&#39;</span><span class="p">,</span> <span class="n">query</span> <span class="o">+</span> <span class="n">query_pos_bias</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-23'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-23'>#</a>
            </div>
            <p><span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.988211em;vertical-align:-1.1391029999999998em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8491080000000002em;"><span style="top:-1.9608970000000003em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="margin-right:0.05017em">B</span></span></span></span><span style="top:-3.0000000000000004em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord"><span class="mord mathnormal">Q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span></span></span></span></span><span class="mord coloredeq eql" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">U</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-2.441336em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" 
style="margin-right:0.07153em">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.394772em;"><span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.1391029999999998em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.713769em;vertical-align:-1.0304389999999999em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6833300000000001em;"><span style="top:-2.069561em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="margin-right:0.02778em">D</span></span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord coloredeq eqp" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.05764em">S</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.05764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight coloredeq eqr" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.0304389999999999em;"><span></span></span></span></span></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">153</span>        <span class="n">bd</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">einsum</span><span class="p">(</span><span class="s1">&#39;bhd,jhd-&gt;jbh&#39;</span><span class="p">,</span> <span class="n">query</span><span class="p">,</span> <span class="n">key_pos_emb</span><span class="p">)</span> <span class="o">+</span> <span class="n">key_pos_bias</span><span class="p">[:,</span> <span class="kc">None</span><span class="p">,</span> <span class="p">:]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-24'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-24'>#</a>
            </div>
            <p><span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.969438em;vertical-align:-0.286108em;"></span><span class="mord"><span class="mord mathnormal">A</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqr" style=""><span class="mord mathnormal mtight" style="margin-right:0.05724em">j</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">156</span>        <span class="k">return</span> <span class="n">ac</span> <span class="o">+</span> <span class="n">bd</span></pre></div>
        </div>
    </div>
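    <div class='section'>
        <div class='docs'>
            <p>The two einsum terms above can be tried on random tensors. This is a minimal sketch, not the module itself: the sizes are hypothetical, and the shapes of <code>query_pos_bias</code> (<code>[heads, d_k]</code>), <code>key_pos_emb</code> (<code>[seq_len, heads, d_k]</code>) and <code>key_pos_bias</code> (<code>[seq_len, heads]</code>) are assumptions inferred from the einsum subscripts and the indexing shown above.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
import torch

# Hypothetical sizes, chosen only for illustration
seq_len, batch_size, heads, d_k = 5, 2, 4, 8

query = torch.randn(batch_size, heads, d_k)         # Q for the current step
key = torch.randn(seq_len, batch_size, heads, d_k)  # K_j for every memory position j
key_pos_emb = torch.randn(seq_len, heads, d_k)      # U^K_j (assumed shape)
query_pos_bias = torch.randn(heads, d_k)            # U^Q (assumed shape)
key_pos_bias = torch.randn(seq_len, heads)          # S_j (assumed shape)

# A + C: (Q + U^Q)^T K_j, summed over the feature dimension d
ac = torch.einsum('bhd,jbhd->jbh', query + query_pos_bias, key)
# B + D: Q^T U^K_j + S_j, with the bias broadcast over the batch dimension
bd = torch.einsum('bhd,jhd->jbh', query, key_pos_emb) + key_pos_bias[:, None, :]

# One score per memory position, batch element and head
scores = ac + bd
print(scores.shape)
```
</pre></div>
        </div>
    </div>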
    <div class='section' id='section-25'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-25'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">query</span></code>
 has shape <code  class="highlight"><span></span><span class="p">[</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span><span class="p">]</span></code>
 </li>
<li><code  class="highlight"><span></span><span class="n">key</span></code>
 and <code  class="highlight"><span></span><span class="n">value</span></code>
 have shape <code  class="highlight"><span></span><span class="p">[</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span><span class="p">]</span></code>
</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">158</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
<span class="lineno">159</span>                <span class="n">query</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span>
<span class="lineno">160</span>                <span class="n">key</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span>
<span class="lineno">161</span>                <span class="n">value</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-26'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-26'>#</a>
            </div>
            <p>Prepare <code  class="highlight"><span></span><span class="n">query</span></code>
, <code  class="highlight"><span></span><span class="n">key</span></code>
 and <code  class="highlight"><span></span><span class="n">value</span></code>
 for the attention computation. After this, <code  class="highlight"><span></span><span class="n">key</span></code>
 and <code  class="highlight"><span></span><span class="n">value</span></code>
 have shape <code  class="highlight"><span></span><span class="p">[</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="n">d_k</span><span class="p">]</span></code>
 and <code  class="highlight"><span></span><span class="n">query</span></code>
 has shape <code  class="highlight"><span></span><span class="p">[</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="n">d_k</span><span class="p">]</span></code>
 </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">170</span>        <span class="n">query</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
<span class="lineno">171</span>        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">key</span><span class="p">:</span>
<span class="lineno">172</span>            <span class="n">key</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">key</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="lineno">173</span>        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">:</span>
<span class="lineno">174</span>            <span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">(</span><span class="n">value</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-27'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-27'>#</a>
            </div>
            <p>Compute the attention scores. This gives a tensor of shape <code  class="highlight"><span></span><span class="p">[</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">heads</span><span class="p">]</span></code>
 </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">178</span>        <span class="n">scores</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_scores</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-28'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-28'>#</a>
            </div>
            <p>Scale the scores by <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.383108em;vertical-align:-0.538em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.5864385em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8622307142857143em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mtight" style="padding-left:0.833em;"><span class="mord mtight"><span class="mord mathnormal mtight">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em;"><span style="top:-2.3487714285714287em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15122857142857138em;"><span></span></span></span></span></span></span></span></span><span style="top:-2.8222307142857144em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.08em;"><svg height="1.08em" preserveaspectratio="xMinYMin slice" viewbox="0 0 400000 1080" width="400em" xmlns="http://www.w3.org/2000/svg"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.17776928571428574em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.538em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">181</span>        <span class="n">scores</span> <span class="o">*=</span> <span class="bp">self</span><span class="o">.</span><span class="n">scale</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-29'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-29'>#</a>
            </div>
            <p>Softmax </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">184</span>        <span class="n">attn</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-30'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-30'>#</a>
            </div>
            <p>Apply dropout </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">187</span>        <span class="n">attn</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">attn</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-31'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-31'>#</a>
            </div>
            <p>Multiply by the values </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">190</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">einsum</span><span class="p">(</span><span class="s2">&quot;jbh,jbhd-&gt;bhd&quot;</span><span class="p">,</span> <span class="n">attn</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-32'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-32'>#</a>
            </div>
            <p>Concatenate the multiple heads </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">193</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-33'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-33'>#</a>
            </div>
            <p>Output layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">196</span>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
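    <div class='section'>
        <div class='docs'>
            <p>The steps of the attention forward pass (scale, softmax, multiply by the values, concatenate the heads) can be sketched on random tensors. This is only an illustration under stated assumptions: the scores are random stand-ins rather than the output of <code>get_scores</code>, the softmax is assumed to normalize over the memory positions (dimension 0), and dropout and the output layer are left out.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
import math

import torch

# Hypothetical sizes, chosen only for illustration
seq_len, batch_size, heads, d_k = 5, 2, 4, 8

# Stand-in scores of shape [seq_len, batch_size, heads]
scores = torch.randn(seq_len, batch_size, heads)
# Values after the multi-head projection: [seq_len, batch_size, heads, d_k]
value = torch.randn(seq_len, batch_size, heads, d_k)

# Scale the scores by 1 / sqrt(d_k)
scores = scores * (1 / math.sqrt(d_k))

# Assumption: softmax over the memory positions j (dimension 0)
attn = torch.softmax(scores, dim=0)

# Weighted sum of the values over j, giving [batch_size, heads, d_k]
x = torch.einsum('jbh,jbhd->bhd', attn, value)

# Concatenate the heads, giving [batch_size, heads * d_k]
x = x.reshape(x.shape[0], -1)
print(x.shape)
```
</pre></div>
        </div>
    </div>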
    <div class='section' id='section-34'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-34'>#</a>
            </div>
            <h2>Feedback Transformer Layer</h2>
<p>This implements a single transformer layer of the feedback transformer. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">199</span><span class="k">class</span> <span class="nc">FeedbackTransformerLayer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-35'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-35'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">d_model</span></code>
 is the number of features in the transformer </li>
<li><code  class="highlight"><span></span><span class="n">attn</span></code>
 is the feedback attention module </li>
<li><code  class="highlight"><span></span><span class="n">feed_forward</span></code>
 is the position-wise feed-forward layer </li>
<li><code  class="highlight"><span></span><span class="n">dropout_prob</span></code>
 is the dropout probability for the dropout layers after attention and feed-forward</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">206</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
<span class="lineno">207</span>                 <span class="n">d_model</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="lineno">208</span>                 <span class="n">attn</span><span class="p">:</span> <span class="n">FeedbackAttention</span><span class="p">,</span>
<span class="lineno">209</span>                 <span class="n">feed_forward</span><span class="p">:</span> <span class="n">FeedForward</span><span class="p">,</span>
<span class="lineno">210</span>                 <span class="n">dropout_prob</span><span class="p">:</span> <span class="nb">float</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-36'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-36'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">217</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-37'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-37'>#</a>
            </div>
            <p>Transformer size <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">219</span>        <span class="bp">self</span><span class="o">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">d_model</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-38'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-38'>#</a>
            </div>
            <p> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">221</span>        <span class="bp">self</span><span class="o">.</span><span class="n">attn</span> <span class="o">=</span> <span class="n">attn</span>
<span class="lineno">222</span>        <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span> <span class="o">=</span> <span class="n">feed_forward</span>
<span class="lineno">223</span>        <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout_prob</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-39'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-39'>#</a>
            </div>
            <p>Normalization layers </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">226</span>        <span class="bp">self</span><span class="o">.</span><span class="n">norm_self_attn</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span>
<span class="lineno">227</span>        <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-40'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-40'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">229</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
<span class="lineno">230</span>                <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span>
<span class="lineno">231</span>                <span class="n">key</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">],</span>
<span class="lineno">232</span>                <span class="n">value</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">]):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-41'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-41'>#</a>
            </div>
            <p>If there is memory </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">234</span>        <span class="k">if</span> <span class="n">key</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-42'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-42'>#</a>
            </div>
            <p>Normalize the vectors before doing self attention </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">236</span>            <span class="n">z</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_self_attn</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-43'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-43'>#</a>
            </div>
            <p>Run through the attention, with the keys and values taken from the memory </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">238</span>            <span class="n">self_attn</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">attn</span><span class="p">(</span><span class="n">query</span><span class="o">=</span><span class="n">z</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">value</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-44'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-44'>#</a>
            </div>
            <p>Add the self attention results </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">240</span>            <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">self_attn</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-45'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-45'>#</a>
            </div>
            <p>Normalize for the feed-forward network </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">243</span>        <span class="n">z</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-46'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-46'>#</a>
            </div>
            <p>Pass through the feed-forward network </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">245</span>        <span class="n">ff</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span><span class="p">(</span><span class="n">z</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-47'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-47'>#</a>
            </div>
            <p>Add the feed-forward results back </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">247</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">ff</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-48'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-48'>#</a>
            </div>
            <p> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">250</span>        <span class="k">return</span> <span class="n">x</span></pre></div>
        </div>
    </div>
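    <div class='section'>
        <div class='docs'>
            <p>The layer above follows a pre-norm residual pattern: normalize, transform, then add the result back to the input. The following is a minimal sketch of that pattern, not this implementation. It assumes PyTorch's built-in <code>nn.MultiheadAttention</code> in place of the attention module used here, assumes attention is skipped when no memory is given, and uses the hypothetical name <code>PreNormBlockSketch</code> and illustrative sizes.</p>
        </div>
        <div class='code'>
```python
import torch
from torch import nn


class PreNormBlockSketch(nn.Module):
    """Hypothetical stand-in for a pre-norm layer that attends to a memory."""

    def __init__(self, d_model: int = 8, n_heads: int = 2, d_ff: int = 16, dropout: float = 0.0):
        super().__init__()
        # Assumption: the built-in attention replaces the custom attention module
        self.attn = nn.MultiheadAttention(d_model, n_heads)
        self.norm_self_attn = nn.LayerNorm(d_model)
        self.norm_ff = nn.LayerNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, mem: torch.Tensor = None):
        # x has shape [batch_size, d_model]
        # mem has shape [mem_len, batch_size, d_model], or is None on the first step
        if mem is not None:
            # Normalize, attend to the memory, and add the result back
            z = self.norm_self_attn(x)
            attn_out, _ = self.attn(z.unsqueeze(0), mem, mem)
            x = x + self.dropout(attn_out.squeeze(0))
        # Normalize, run the feed-forward network, and add the result back
        z = self.norm_ff(x)
        return x + self.dropout(self.feed_forward(z))


block = PreNormBlockSketch()
x = torch.randn(3, 8)
mem = torch.randn(5, 3, 8)
out_no_mem = block(x)
out_with_mem = block(x, mem)
```
        </div>
    </div>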
    <div class='section' id='section-49'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-49'>#</a>
            </div>
            <h2>Feedback Transformer Module</h2>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">253</span><span class="k">class</span> <span class="nc">FeedbackTransformer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-50'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-50'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">layer</span></code>
 is the feedback transformer layer, which we clone for each layer </li>
<li><code  class="highlight"><span></span><span class="n">n_layers</span></code>
 is the number of layers in the transformer</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">258</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">layer</span><span class="p">:</span> <span class="n">FeedbackTransformerLayer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-51'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-51'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">264</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-52'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-52'>#</a>
            </div>
            <p>Make copies of the transformer layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">266</span>        <span class="bp">self</span><span class="o">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">clone_module_list</span><span class="p">(</span><span class="n">layer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-53'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-53'>#</a>
            </div>
            <p>Final normalization layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">268</span>        <span class="bp">self</span><span class="o">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">layer</span><span class="o">.</span><span class="n">size</span><span class="p">])</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-54'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-54'>#</a>
            </div>
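            <p>Each memory vector is computed as a weighted sum of the representations from every layer. This parameter holds those weights; there is one weight per layer plus one for the input to the first layer. </p>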

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">271</span>        <span class="bp">self</span><span class="o">.</span><span class="n">weights</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_layers</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-55'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-55'>#</a>
            </div>
            <p>Softmax over the weights, applied before taking the weighted sum </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">273</span>        <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Softmax</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-56'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-56'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">x_seq</span></code>
 is the input with shape <code  class="highlight"><span></span><span class="p">[</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span><span class="p">]</span></code>
</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">275</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x_seq</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-57'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-57'>#</a>
            </div>
            <p>Split the input into a list along the sequence axis </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">281</span>        <span class="n">x_seq</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">unbind</span><span class="p">(</span><span class="n">x_seq</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-58'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-58'>#</a>
            </div>
            <p>List to store the outputs </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">283</span>        <span class="n">res</span> <span class="o">=</span> <span class="p">[]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-59'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-59'>#</a>
            </div>
            <p>List to store the memory vectors </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">285</span>        <span class="n">mem</span> <span class="o">=</span> <span class="p">[]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-60'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-60'>#</a>
            </div>
            <p>For each input step </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">287</span>        <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">x_seq</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-61'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-61'>#</a>
            </div>
            <p>List to store the layer outputs, starting with the input to the first layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">289</span>            <span class="n">layer_outputs</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="p">]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-62'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-62'>#</a>
            </div>
            <p>If there is memory, stack the memory vectors into a single tensor </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">292</span>            <span class="n">mem_tensor</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">mem</span><span class="p">)</span> <span class="k">if</span> <span class="n">mem</span> <span class="k">else</span> <span class="kc">None</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-63'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-63'>#</a>
            </div>
            <p>Run through each layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">295</span>            <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">layers</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-64'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-64'>#</a>
            </div>
            <p>Get the layer output </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">297</span>                <span class="n">x</span> <span class="o">=</span> <span class="n">layer</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">mem_tensor</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">mem_tensor</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-65'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-65'>#</a>
            </div>
            <p>Append it to the list of layer outputs </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">299</span>                <span class="n">layer_outputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-66'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-66'>#</a>
            </div>
            <p>Stack the layer outputs into a tensor </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">302</span>            <span class="n">layer_outputs</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">layer_outputs</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-67'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-67'>#</a>
            </div>
            <p>Calculate the memory vector as a weighted sum of the layer outputs </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">304</span>            <span class="n">mem</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">einsum</span><span class="p">(</span><span class="s1">&#39;lbd,l-&gt;bd&#39;</span><span class="p">,</span> <span class="n">layer_outputs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">weights</span><span class="p">)))</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-68'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-68'>#</a>
            </div>
            <p>Append the output to the results </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">306</span>            <span class="n">res</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-69'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-69'>#</a>
            </div>
            <p>Stack the output tensors </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">309</span>        <span class="n">res</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">res</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-70'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-70'>#</a>
            </div>
            <p>Normalize the output </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">311</span>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">res</span><span class="p">)</span></pre></div>
        </div>
    </div>
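    <div class='section'>
        <div class='docs'>
            <p>The memory vector for a step is a softmax-weighted sum over the stacked layer outputs. A minimal sketch of that one computation, assuming illustrative sizes and random tensors standing in for real layer outputs: with the weights initialized to ones, the softmax is uniform, so the weighted sum is just the mean over layers.</p>
        </div>
        <div class='code'>
```python
import torch

# Illustrative sizes (assumptions, not values from the implementation)
n_layers, batch_size, d_model = 3, 2, 4

# Stand-in for the stacked outputs: the layer input plus one output per layer
layer_outputs = torch.randn(n_layers + 1, batch_size, d_model)

# Weights initialized to ones, as in the module above
weights = torch.ones(n_layers + 1)
probs = torch.softmax(weights, dim=0)

# Weighted sum over the layer axis `l`, keeping batch `b` and features `d`
mem = torch.einsum('lbd,l->bd', layer_outputs, probs)

# With uniform weights this equals the mean over layers
expected = layer_outputs.mean(dim=0)
```
        </div>
    </div>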
    <div class='section' id='section-71'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-71'>#</a>
            </div>
            <p><a id="shared_kv"></a></p>
<h1>Shared keys and values among layers</h1>

        </div>
        <div class='code'>
            <div class="highlight"><pre></pre></div>
        </div>
    </div>
    <div class='section' id='section-72'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-72'>#</a>
            </div>
            <h3>Stack Function implementation</h3>
<p>We implement a custom function instead of appending to a Python list and then calling <code  class="highlight"><span></span><span class="n">torch</span><span class="o">.</span><span class="n">stack</span></code>. This greatly improves performance over calling <code  class="highlight"><span></span><span class="n">torch</span><span class="o">.</span><span class="n">stack</span></code>
 at each step along the sequence. Every time <code  class="highlight"><span></span><span class="n">torch</span><span class="o">.</span><span class="n">stack</span></code>
 is called, it creates a new tensor, whereas this method and the accompanying class <code  class="highlight"><span></span><span class="n">Stack</span></code>
 share one block of memory across all steps. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">318</span><span class="k">class</span> <span class="nc">StackFunction</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">autograd</span><span class="o">.</span><span class="n">Function</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-73'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-73'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">ctx</span></code>
 is the context of the function (it lets us cache values for the backward pass) </li>
<li><code  class="highlight"><span></span><span class="n">memory</span></code>
 is the shared memory tensor where we stack and store the values of each step (keys and values) </li>
<li><code  class="highlight"><span></span><span class="n">memory_grad</span></code>
 is the shared memory tensor used to store and accumulate the gradients of each step </li>
<li><code  class="highlight"><span></span><span class="n">last</span></code>
 is the last value that was stacked </li>
<li><code  class="highlight"><span></span><span class="n">n</span></code>
 is the index of the latest step (the stack then holds <code  class="highlight"><span></span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span></code> values)</li></ul>
<p>This returns the stacked tensor for all steps up to and including <code  class="highlight"><span></span><span class="n">n</span></code>. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">330</span>    <span class="nd">@staticmethod</span>
<span class="lineno">331</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">memory</span><span class="p">,</span> <span class="n">memory_grad</span><span class="p">,</span> <span class="n">last</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-74'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-74'>#</a>
            </div>
            <p>Cache the accumulated gradients </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">343</span>        <span class="n">ctx</span><span class="o">.</span><span class="n">_mem_grad</span> <span class="o">=</span> <span class="n">memory_grad</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-75'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-75'>#</a>
            </div>
            <p>Cache the size of the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">345</span>        <span class="n">ctx</span><span class="o">.</span><span class="n">_n</span> <span class="o">=</span> <span class="n">n</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-76'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-76'>#</a>
            </div>
            <p>Return the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">347</span>        <span class="k">return</span> <span class="n">memory</span><span class="p">[:</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span></pre></div>
        </div>
    </div>
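    <div class='section'>
        <div class='docs'>
            <p>A minimal sketch of why returning a prefix slice of a preallocated buffer is cheaper than re-stacking a list at every step. The sizes and the names <code>history</code>, <code>buffer</code> and <code>views</code> are illustrative assumptions. The sketch ignores gradients entirely, which is the part the custom function above takes care of.</p>
        </div>
        <div class='code'>
```python
import torch

# Illustrative sizes (assumptions)
max_len, batch_size, d_model = 4, 2, 3
steps = [torch.randn(batch_size, d_model) for _ in range(max_len)]

# Approach 1: re-stack a growing Python list at every step.
# Each `torch.stack` call allocates and fills a brand new tensor.
history = []
stacked_each_step = []
for s in steps:
    history.append(s)
    stacked_each_step.append(torch.stack(history))

# Approach 2: write each step into one preallocated buffer
# and take prefix slices, which are views on the same storage.
buffer = torch.zeros(max_len, batch_size, d_model)
views = []
for n, s in enumerate(steps):
    buffer[n] = s
    views.append(buffer[:n + 1])

# Both approaches expose the same values at every step
same_values = all(torch.equal(a, b) for a, b in zip(stacked_each_step, views))
# Every prefix slice starts at the buffer's first element, so no copy was made
shares_storage = all(v.data_ptr() == buffer.data_ptr() for v in views)
```
        </div>
    </div>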
    <div class='section' id='section-77'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-77'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">grad_output</span></code>
 is the gradient with respect to the output of the <code  class="highlight"><span></span><span class="n">forward</span></code>
 function</li></ul>
<p>This accumulates the gradients in the shared memory tensor and returns the gradient with respect to the <code  class="highlight"><span></span><span class="n">last</span></code>
 value in the stack. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">349</span>    <span class="nd">@staticmethod</span>
<span class="lineno">350</span>    <span class="k">def</span> <span class="nf">backward</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">grad_output</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-78'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-78'>#</a>
            </div>
            <p>Get the current size of the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">358</span>        <span class="n">n</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">_n</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-79'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-79'>#</a>
            </div>
            <p>Get the accumulated gradients </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">360</span>        <span class="n">memory_grad</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">_mem_grad</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-80'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-80'>#</a>
            </div>
            <p>Add the gradients </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">362</span>        <span class="n">memory_grad</span><span class="p">[:</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">+=</span> <span class="n">grad_output</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-81'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-81'>#</a>
            </div>
            <p>Return the gradient with respect to the last value in the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">364</span>        <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="n">memory_grad</span><span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kc">None</span></pre></div>
        </div>
    </div>
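    <div class='section'>
        <div class='docs'>
            <p>PyTorch expects <code>backward</code> to return one value per argument of <code>forward</code>, with <code>None</code> for arguments that need no gradient; that is why the method above returns four values. A minimal sketch of the same convention using a hypothetical two-argument function <code>ScaleBy</code>, which is not part of this implementation:</p>
        </div>
        <div class='code'>
```python
import torch


class ScaleBy(torch.autograd.Function):
    """Hypothetical example: multiply a tensor by a plain Python number."""

    @staticmethod
    def forward(ctx, x, factor):
        # Cache the factor on the context for the backward pass
        ctx._factor = factor
        return x * factor

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per `forward` argument:
        # a gradient for `x`, and `None` for the non-tensor `factor`
        return grad_output * ctx._factor, None


x = torch.ones(3, requires_grad=True)
y = ScaleBy.apply(x, 2.0)
y.sum().backward()
```
        </div>
    </div>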
    <div class='section' id='section-82'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-82'>#</a>
            </div>
            <h3>Stack Module</h3>
<p>This uses the stack function defined above and performs the necessary initializations. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">367</span><span class="k">class</span> <span class="nc">Stack</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-83'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-83'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">max_len</span></code>
 is the maximum size of the stack</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">374</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">max_len</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-84'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-84'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">378</span>        <span class="bp">self</span><span class="o">.</span><span class="n">max_len</span> <span class="o">=</span> <span class="n">max_len</span>
<span class="lineno">379</span>        <span class="bp">self</span><span class="o">.</span><span class="n">memory</span> <span class="o">=</span> <span class="kc">None</span>
<span class="lineno">380</span>        <span class="bp">self</span><span class="o">.</span><span class="n">memory_grad</span> <span class="o">=</span> <span class="kc">None</span>
<span class="lineno">381</span>        <span class="bp">self</span><span class="o">.</span><span class="n">last</span> <span class="o">=</span> <span class="kc">None</span>
<span class="lineno">382</span>        <span class="bp">self</span><span class="o">.</span><span class="n">n</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="lineno">383</span>        <span class="bp">self</span><span class="o">.</span><span class="n">last_get_n</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-85'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-85'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">n</span></code>
 is the size of the stack before the value is added, i.e. the index at which it is stored </li>
<li><code  class="highlight"><span></span><span class="n">value</span></code>
 is the tensor to be added to the stack</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">385</span>    <span class="k">def</span> <span class="nf">append</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">value</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-86'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-86'>#</a>
            </div>
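            <p>You need to get (use) the stack after adding a value; otherwise this implementation fails. The assertion checks that the stack was last read at step <code  class="highlight"><span></span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span></code> </p>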

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">393</span>        <span class="k">assert</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">or</span> <span class="bp">self</span><span class="o">.</span><span class="n">last_get_n</span> <span class="o">==</span> <span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">last_get_n</span><span class="si">}</span><span class="s2">&quot;</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-87'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-87'>#</a>
            </div>
            <p>Do this without tracking gradients </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">396</span>        <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-88'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-88'>#</a>
            </div>
            <p>Initialize the shared memory tensor that holds the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">398</span>            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">memory</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="bp">self</span><span class="o">.</span><span class="n">memory</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="o">!=</span> <span class="n">value</span><span class="o">.</span><span class="n">shape</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-89'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-89'>#</a>
            </div>
            <p>This should only happen when the stack is empty </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">400</span>                <span class="k">assert</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-90'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-90'>#</a>
            </div>
            <p>Create a tensor for the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">402</span>                <span class="bp">self</span><span class="o">.</span><span class="n">memory</span> <span class="o">=</span> <span class="n">value</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">max_len</span><span class="p">,</span> <span class="o">*</span><span class="n">value</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">requires_grad</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-91'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-91'>#</a>
            </div>
            <p>Create a tensor to accumulate the gradients </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">404</span>                <span class="bp">self</span><span class="o">.</span><span class="n">memory_grad</span> <span class="o">=</span> <span class="n">value</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">memory</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">requires_grad</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-92'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-92'>#</a>
            </div>
            <p>The memory is already initialized, but we are resetting the stack. </p>
<p>This could have been a separate function such as <code  class="highlight"><span></span><span class="n">reset</span></code>
, but we found this easier to use. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">409</span>            <span class="k">elif</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-93'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-93'>#</a>
            </div>
            <p>Reset the accumulated gradients </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">411</span>                <span class="bp">self</span><span class="o">.</span><span class="n">memory_grad</span><span class="o">.</span><span class="n">fill_</span><span class="p">(</span><span class="mf">0.</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-94'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-94'>#</a>
            </div>
            <p>Set the value at the correct position in the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">414</span>            <span class="bp">self</span><span class="o">.</span><span class="n">memory</span><span class="o">.</span><span class="n">data</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-95'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-95'>#</a>
            </div>
            <p>Keep track of the current position in the stack (for debugging) </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">416</span>            <span class="bp">self</span><span class="o">.</span><span class="n">n</span> <span class="o">=</span> <span class="n">n</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-96'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-96'>#</a>
            </div>
            <p>Keep track of the last value added to the stack. We need to pass this on to <code  class="highlight"><span></span><span class="n">StackFunction</span></code>
 so that the gradients are propagated backwards. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">421</span>        <span class="bp">self</span><span class="o">.</span><span class="n">last</span> <span class="o">=</span> <span class="n">value</span></pre></div>
        </div>
    </div>
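    <div class='section'>
        <div class='docs'>
            <p>The bookkeeping in <code>append</code> can be sketched without PyTorch. This is a minimal illustration, not the implementation on this page: <code>ToyStack</code> is a hypothetical name, it uses nested lists instead of tensors, it leaves out the gradient buffer, and it assumes that reading the stack returns slots 0 to <code>n</code>. It shows the buffer being allocated once, written slot by slot, and reused when a new sequence starts at <code>n == 0</code>.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
class ToyStack:
    """Hypothetical plain-Python stand-in for the preallocated stack."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.memory = None  # allocated lazily on the first append
        self.n = None

    def append(self, n, value):
        if self.memory is None or len(self.memory[0]) != len(value):
            # A fresh buffer is only expected when the stack is empty
            assert n == 0
            self.memory = [[0.0] * len(value) for _ in range(self.max_len)]
        # Write into slot n; the other slots are left as they are
        self.memory[n] = list(value)
        self.n = n

    def get(self):
        # Assumption: the filled part of the buffer is slots 0..n
        return self.memory[: self.n + 1]


stack = ToyStack(max_len=4)
stack.append(0, [1.0, 2.0])
stack.append(1, [3.0, 4.0])
filled = stack.get()

# Starting a new sequence at n == 0 reuses the same buffer
buffer_before = stack.memory
stack.append(0, [5.0, 6.0])
same_buffer = stack.memory is buffer_before
```
</pre></div>
        </div>
    </div>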
    <div class='section' id='section-97'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-97'>#</a>
            </div>
            <p>Returns the stack</p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">423</span>    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-98'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-98'>#</a>
            </div>
            <p>Keep track of the size of the stack when it was used. This is used for a sanity check in <code  class="highlight"><span></span><span class="n">append</span></code>
. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">430</span>        <span class="bp">self</span><span class="o">.</span><span class="n">last_get_n</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">n</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-99'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-99'>#</a>
            </div>
            <p>Pass everything through <code  class="highlight"><span></span><span class="n">StackFunction</span></code>
 so that PyTorch calls <code  class="highlight"><span></span><span class="n">StackFunction</span><span class="o">.</span><span class="n">backward</span></code>
 during backpropagation. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">433</span>        <span class="k">return</span> <span class="n">StackFunction</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">memory</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">memory_grad</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">last</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">n</span><span class="p">)</span></pre></div>
        </div>
    </div>
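    <div class='section'>
        <div class='docs'>
            <p>Routing a tensor through a custom autograd function is what lets PyTorch call that function's <code>backward</code> during backpropagation. The sketch below shows the general pattern only. <code>ScaleGrad</code> is a hypothetical example and is not the <code>StackFunction</code> used here: it passes the input through unchanged and doubles the gradient on the way back, so the effect of the custom <code>backward</code> is visible in <code>x.grad</code>.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
import torch


class ScaleGrad(torch.autograd.Function):
    """Hypothetical example: identity in forward, doubled gradient in backward."""

    @staticmethod
    def forward(ctx, x):
        # Return a view so the output is a distinct tensor object
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # PyTorch calls this during backpropagation
        return grad_output * 2.0


x = torch.ones(3, requires_grad=True)
y = ScaleGrad.apply(x)
y.sum().backward()
grad = x.grad.tolist()
```
</pre></div>
        </div>
    </div>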
    <div class='section' id='section-100'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-100'>#</a>
            </div>
            <p>To release memory</p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">435</span>    <span class="k">def</span> <span class="nf">free</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-101'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-101'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">440</span>        <span class="bp">self</span><span class="o">.</span><span class="n">memory</span> <span class="o">=</span> <span class="kc">None</span>
<span class="lineno">441</span>        <span class="bp">self</span><span class="o">.</span><span class="n">memory_grad</span> <span class="o">=</span> <span class="kc">None</span>
<span class="lineno">442</span>        <span class="bp">self</span><span class="o">.</span><span class="n">last</span> <span class="o">=</span> <span class="kc">None</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-102'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-102'>#</a>
            </div>
            <h2>Updated Feedback Transformer Module</h2>
<p>This is the updated feedback transformer module that caches the keys and values. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">445</span><span class="k">class</span> <span class="nc">FeedbackTransformerKV</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-103'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-103'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">layer</span></code>
 is the feedback transformer layer, which we clone for each layer </li>
<li><code  class="highlight"><span></span><span class="n">n_layers</span></code>
 is the number of layers in the transformer </li>
<li><code  class="highlight"><span></span><span class="n">d_model</span></code>
 is the number of features in the transformer </li>
<li><code  class="highlight"><span></span><span class="n">heads</span></code>
 is the number of attention heads</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">452</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">layer</span><span class="p">:</span> <span class="n">FeedbackTransformerLayer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">d_model</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">heads</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-104'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-104'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">460</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-105'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-105'>#</a>
            </div>
            <p>Make copies of the transformer layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">462</span>        <span class="bp">self</span><span class="o">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">clone_module_list</span><span class="p">(</span><span class="n">layer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-106'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-106'>#</a>
            </div>
            <p>Final normalization layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">464</span>        <span class="bp">self</span><span class="o">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">layer</span><span class="o">.</span><span class="n">size</span><span class="p">])</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-107'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-107'>#</a>
            </div>
            <p>Memory vectors are computed as a weighted sum of the representations of each layer. This is the weights parameter for that sum. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">467</span>        <span class="bp">self</span><span class="o">.</span><span class="n">weights</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_layers</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-108'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-108'>#</a>
            </div>
            <p>Softmax for the weights before taking the weighted sum </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">469</span>        <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Softmax</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span></pre></div>
        </div>
    </div>
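    <div class='section'>
        <div class='docs'>
            <p>Because the weights parameter starts as a vector of ones, the softmax initially gives every layer the same share of the weighted sum. The plain-Python sketch below illustrates this. The <code>softmax</code> helper and the value of <code>n_layers</code> are assumptions made for the example, not part of the implementation.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
import math


def softmax(values):
    # Subtract the maximum for numerical stability
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


n_layers = 3  # hypothetical layer count
weights = [1.0] * (n_layers + 1)  # mirrors torch.ones(n_layers + 1)
normalized = softmax(weights)
```
</pre></div>
        </div>
    </div>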
    <div class='section' id='section-109'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-109'>#</a>
            </div>
            <p>Number of features in a head </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">472</span>        <span class="n">d_k</span> <span class="o">=</span> <span class="n">d_model</span> <span class="o">//</span> <span class="n">heads</span></pre></div>
        </div>
    </div>
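    <div class='section'>
        <div class='docs'>
            <p>The per-head size comes from floor division, so any remainder is silently dropped. The small sketch below uses hypothetical sizes to show the usual case where <code>heads</code> divides <code>d_model</code> evenly, and one where it does not.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
d_model, heads = 512, 8  # hypothetical sizes
d_k = d_model // heads
divides_evenly = d_model % heads == 0

# With a head count that does not divide d_model, some features are lost
uneven_heads = 6
uneven_d_k = d_model // uneven_heads
lost_features = d_model - uneven_d_k * uneven_heads
```
</pre></div>
        </div>
    </div>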
    <div class='section' id='section-110'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-110'>#</a>
            </div>
            <p>Module to transform embeddings (memory) to get keys </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">474</span>        <span class="bp">self</span><span class="o">.</span><span class="n">key</span> <span class="o">=</span> <span class="n">PrepareForMultiHeadAttention</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="n">d_k</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-111'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-111'>#</a>
            </div>
            <p>Module to transform embeddings (memory) to get values </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">476</span>        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">PrepareForMultiHeadAttention</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="n">d_k</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-112'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-112'>#</a>
            </div>
            <p>Memory for stacked keys </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">479</span>        <span class="bp">self</span><span class="o">.</span><span class="n">mem_key</span> <span class="o">=</span> <span class="n">Stack</span><span class="p">(</span><span class="mi">512</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-113'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-113'>#</a>
            </div>
            <p>Memory for stacked values </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">481</span>        <span class="bp">self</span><span class="o">.</span><span class="n">mem_value</span> <span class="o">=</span> <span class="n">Stack</span><span class="p">(</span><span class="mi">512</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-114'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-114'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">x_seq</span></code>
 is the input with shape <code  class="highlight"><span></span><span class="p">[</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span><span class="p">]</span></code>
</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">483</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x_seq</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-115'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-115'>#</a>
            </div>
            <p>Split the input into a list along the sequence axis </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">489</span>        <span class="n">x_seq</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">unbind</span><span class="p">(</span><span class="n">x_seq</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span></pre></div>
        </div>
    </div>
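    <div class='section'>
        <div class='docs'>
            <p>As a small illustration of the split above, <code>torch.unbind</code> along dimension 0 returns one tensor per sequence step, each with shape <code>[batch_size, d_model]</code>. The sizes below are hypothetical.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
import torch

seq_len, batch_size, d_model = 4, 2, 3  # hypothetical sizes
x_seq = torch.arange(seq_len * batch_size * d_model, dtype=torch.float32)
x_seq = x_seq.reshape(seq_len, batch_size, d_model)

# One tensor per step, with the sequence dimension removed
steps = torch.unbind(x_seq, dim=0)
num_steps = len(steps)
step_shape = tuple(steps[0].shape)
```
</pre></div>
        </div>
    </div>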
    <div class='section' id='section-116'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-116'>#</a>
            </div>
            <p>List to store the outputs </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">491</span>        <span class="n">res</span> <span class="o">=</span> <span class="p">[]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-117'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-117'>#</a>
            </div>
            <p>For each input step </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">493</span>        <span class="k">for</span> <span class="n">step</span><span class="p">,</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">x_seq</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-118'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-118'>#</a>
            </div>
            <p>List to store layer outputs </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">495</span>            <span class="n">layer_outputs</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="p">]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-119'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-119'>#</a>
            </div>
            <p>Tensors for the stacked keys and values </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">498</span>            <span class="n">key_tensor</span> <span class="o">=</span> <span class="kc">None</span>
<span class="lineno">499</span>            <span class="n">value_tensor</span> <span class="o">=</span> <span class="kc">None</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-120'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-120'>#</a>
            </div>
            <p>Get the key and value tensors if we are beyond the initial step </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">501</span>            <span class="k">if</span> <span class="n">step</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
<span class="lineno">502</span>                <span class="n">key_tensor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">mem_key</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
<span class="lineno">503</span>                <span class="n">value_tensor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">mem_value</span><span class="o">.</span><span class="n">get</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-121'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-121'>#</a>
            </div>
            <p>Run through each layer </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">506</span>            <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">layers</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-122'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-122'>#</a>
            </div>
            <p>Get the layer output </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">508</span>                <span class="n">x</span> <span class="o">=</span> <span class="n">layer</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">key_tensor</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">value_tensor</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-123'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-123'>#</a>
            </div>
            <p>Append it to the list of layer outputs </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">510</span>                <span class="n">layer_outputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-124'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-124'>#</a>
            </div>
            <p>Stack the layer outputs into a tensor </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">513</span>            <span class="n">layer_outputs</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">layer_outputs</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-125'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-125'>#</a>
            </div>
            <p>Calculate the memory vector as a weighted sum of layer outputs </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">515</span>            <span class="n">mem</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">einsum</span><span class="p">(</span><span class="s1">&#39;lbd,l-&gt;bd&#39;</span><span class="p">,</span> <span class="n">layer_outputs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">weights</span><span class="p">))</span></pre></div>
        </div>
    </div>
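    <div class='section'>
        <div class='docs'>
            <p>The einsum pattern <code>'lbd,l->bd'</code> multiplies each layer's output by that layer's weight and sums over the layer axis. The sketch below checks this against the same sum written out with broadcasting. The sizes and the random inputs are hypothetical.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
import torch

torch.manual_seed(0)
n_layers, batch_size, d_model = 3, 2, 4  # hypothetical sizes
layer_outputs = torch.randn(n_layers + 1, batch_size, d_model)
weights = torch.softmax(torch.ones(n_layers + 1), dim=0)

mem = torch.einsum('lbd,l->bd', layer_outputs, weights)

# The same sum written out: scale each layer's output, then add over layers
expected = (layer_outputs * weights.view(-1, 1, 1)).sum(dim=0)
matches = torch.allclose(mem, expected, atol=1e-6)
```
</pre></div>
        </div>
    </div>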
    <div class='section' id='section-126'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-126'>#</a>
            </div>
            <p>Calculate the keys from memory and add them to the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">517</span>            <span class="bp">self</span><span class="o">.</span><span class="n">mem_key</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">key</span><span class="p">(</span><span class="n">mem</span><span class="p">))</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-127'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-127'>#</a>
            </div>
            <p>Calculate the values from memory and add them to the stack </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">519</span>            <span class="bp">self</span><span class="o">.</span><span class="n">mem_value</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">(</span><span class="n">mem</span><span class="p">))</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-128'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-128'>#</a>
            </div>
            <p>Append the output to the results </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">521</span>            <span class="n">res</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-129'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-129'>#</a>
            </div>
            <p>Stack the output tensors </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">524</span>        <span class="n">res</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">res</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-130'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-130'>#</a>
            </div>
            <p>Normalize the output </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">526</span>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">res</span><span class="p">)</span></pre></div>
        </div>
    </div>
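    <div class='section'>
        <div class='docs'>
            <p>The final normalization acts on the last dimension only, so every position in every sequence is normalized independently. The sketch below uses hypothetical sizes and random data. With a freshly constructed <code>nn.LayerNorm</code> (scale 1, bias 0), each feature vector should come out with mean close to 0 and variance close to 1.</p>
        </div>
        <div class='code'>
            <div class="highlight"><pre>
```python
import torch
from torch import nn

torch.manual_seed(0)
seq_len, batch_size, d_model = 5, 2, 8  # hypothetical sizes
res = torch.randn(seq_len, batch_size, d_model) * 3.0 + 1.0

norm = nn.LayerNorm([d_model])
out = norm(res)

# Statistics over the feature dimension of each position
means = out.mean(dim=-1)
variances = out.var(dim=-1, unbiased=False)
```
</pre></div>
        </div>
    </div>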
    <div class='section' id='section-131'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-131'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">528</span>    <span class="k">def</span> <span class="nf">free</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="lineno">529</span>        <span class="bp">self</span><span class="o">.</span><span class="n">mem_key</span><span class="o">.</span><span class="n">free</span><span class="p">()</span>
<span class="lineno">530</span>        <span class="bp">self</span><span class="o">.</span><span class="n">mem_value</span><span class="o">.</span><span class="n">free</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='footer'>
        <a href="https://papers.labml.ai">Trending Research Papers</a>
        <a href="https://labml.ai">labml.ai</a>
    </div>
</div>
<script src="../../interactive.js?v=1"></script>
<script>
    function handleImages() {
        var images = document.querySelectorAll('p>img')

        for (var i = 0; i < images.length; ++i) {
            handleImage(images[i])
        }
    }

    function handleImage(img) {
        img.parentElement.style.textAlign = 'center'

        var modal = document.createElement('div')
        modal.id = 'modal'

        var modalContent = document.createElement('div')
        modal.appendChild(modalContent)

        var modalImage = document.createElement('img')
        modalContent.appendChild(modalImage)

        var span = document.createElement('span')
        span.classList.add('close')
        span.textContent = 'x'
        modal.appendChild(span)

        img.onclick = function () {
            console.log('clicked')
            document.body.appendChild(modal)
            modalImage.src = img.src
        }

        span.onclick = function () {
            document.body.removeChild(modal)
        }
    }

    handleImages()
</script>
</body>
</html>